Unsupervised Feature Selection for Multi-View Data in Social Media

Authors

  • Jiliang Tang
  • Xia Hu
  • Huiji Gao
  • Huan Liu
Abstract

The explosive popularity of social media produces mountains of high-dimensional data, and the nature of social media means that this data is often unlabelled, noisy and partial, presenting new challenges for feature selection. Social media data can be represented by heterogeneous feature spaces in the form of multiple views. In general, multiple views can be complementary and, when used together, can help single-view feature selection cope with noisy and partial data. These unique challenges and properties motivate us to develop a novel feature selection framework to handle multi-view social media data. In this paper, we investigate how to exploit relations among views so that they help each other select relevant features, and propose a novel unsupervised feature selection framework, MVFS, for multi-view social media data. We systematically evaluate the proposed framework on multi-view datasets from social media websites, and the results demonstrate the effectiveness and potential of MVFS.

Similar resources

Adaptive Unsupervised Multi-view Feature Selection for Visual Concept Recognition

To reveal and leverage the correlated and complementary information between different views, a great number of multi-view learning algorithms have been proposed in recent years. However, unsupervised feature selection in multi-view learning is still a challenge due to the lack of data labels that could be utilized to select the discriminative features. Moreover, most of the traditional feature select...

Linked Unsupervised Based Advanced Feature Selection Framework with Artificial Bee Colony for Social Media Data

The explosive usage of social media produces a large amount of unlabeled and high-dimensional data. Feature selection has been proven to be effective in dealing with high-dimensional data for efficient learning and data mining. Unsupervised feature selection remains a challenging task due to the absence of label information based on which feature relevance is often assessed. Existing work investi...

Multi-View Unsupervised User Feature Embedding for Social Media-based Substance Use Prediction

In this paper, we demonstrate how the state-of-the-art machine learning and text mining techniques can be used to build effective social media-based substance use detection systems. Since a substance use ground truth is difficult to obtain on a large scale, to maximize system performance, we explore different unsupervised feature learning methods to take advantage of a large amount of unsupervi...

Social Media-based Substance Use Prediction

In this paper, we demonstrate how the state-of-the-art machine learning and text mining techniques can be used to build effective social media-based substance use detection systems. Since a substance use ground truth is difficult to obtain on a large scale, to maximize system performance, we explore different feature learning methods to take advantage of a large amount of unsupervised social me...

A Multi-label Text Classification Framework: Using Supervised and Unsupervised Feature Selection Strategy

Text classification, the task of assigning metadata to documents, requires significant time and effort when performed by humans. Moreover, with online-generated content growing explosively, manually annotating large-scale, unstructured data becomes a challenge. Currently, many state-of-the-art text mining methods have been applied to the classification process, many of them based on the key wor...

Publication date: 2013